
    Ultra-tight GPS/IMU Integration based Long-Range Rocket Projectile Navigation

    Accurate navigation is essential for the precise strike capability of long-range rocket projectiles. To obtain a stable, high-performance navigation solution, an ultra-tight navigation approach based on the integration of the global positioning system (GPS) and an inertial measurement unit (IMU) is proposed. In this study, the short-term, high-accuracy position output of the IMU assists carrier-phase tracking in the GPS receiver, and the IMU and GPS outputs are then fused with a federated filter. A cubature Kalman filter replaces the unscented Kalman filter as the local filter and is improved with the strong-tracking principle, while the federated filter itself is improved with vector-sharing theory. Finally, a simulation based on real ballistic data shows, from the estimation-error statistics, that the navigation accuracy of the proposed method is higher than that of the traditional method.
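
    The local filter mentioned above is a cubature Kalman filter. For reference, below is a minimal sketch of the standard CKF time-update (cubature-point generation, propagation, and predicted covariance); the strong-tracking modification and the federated vector-sharing fusion described in the abstract are not shown, and the process model `f` is a placeholder assumption.

    ```python
    # Minimal sketch of a standard cubature Kalman filter (CKF) time-update.
    # The strong-tracking and federated-filter improvements from the paper are
    # not modeled here; f is a hypothetical state-transition function.
    import numpy as np

    def ckf_predict(x, P, f, Q):
        """One CKF prediction step: x is the state mean (n,), P its covariance
        (n, n), f the process model, Q the process-noise covariance."""
        n = x.shape[0]
        S = np.linalg.cholesky(P)                               # square root of P
        unit = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])  # 2n cubature directions
        points = x[:, None] + S @ unit                          # cubature points, shape (n, 2n)
        prop = np.column_stack([f(points[:, i]) for i in range(2 * n)])
        x_pred = prop.mean(axis=1)                              # predicted mean
        diff = prop - x_pred[:, None]
        P_pred = diff @ diff.T / (2 * n) + Q                    # predicted covariance
        return x_pred, P_pred
    ```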

    Scene Graph Generation with External Knowledge and Image Reconstruction

    Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attribute and relationship prediction, etc. However, existing datasets are biased in terms of object and relationship labels, or often come with noisy and missing annotations, which makes the development of a reliable scene graph prediction model very challenging. In this paper, we propose a novel scene graph generation algorithm with external knowledge and an image reconstruction loss to overcome these dataset issues. In particular, we extract commonsense knowledge from an external knowledge base to refine object and phrase features for improving generalizability in scene graph generation. To address the bias of noisy object annotations, we introduce an auxiliary image reconstruction path to regularize the scene graph generation network. Extensive experiments show that our framework can generate better scene graphs, achieving state-of-the-art performance on two benchmark datasets: Visual Relationship Detection and Visual Genome.
    Comment: 10 pages, 5 figures, Accepted in CVPR 201
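
    To make the training objective concrete, here is a hedged sketch of how a scene-graph prediction loss can be regularized by an auxiliary image-reconstruction loss, as the abstract describes; the module names, loss choices, and the weight `lambda_rec` are illustrative assumptions rather than the authors' implementation.

    ```python
    # Hedged sketch of a scene-graph objective with an auxiliary reconstruction
    # regularizer. Backbone, head, and decoder are assumed user-supplied modules.
    import torch.nn as nn
    import torch.nn.functional as F

    class SceneGraphWithReconstruction(nn.Module):
        def __init__(self, backbone, sg_head, decoder, lambda_rec=0.1):
            super().__init__()
            self.backbone = backbone      # shared image encoder
            self.sg_head = sg_head        # predicts object / relationship logits
            self.decoder = decoder        # reconstructs the image from features
            self.lambda_rec = lambda_rec  # weight of the reconstruction regularizer

        def forward(self, images, sg_targets):
            feats = self.backbone(images)
            obj_logits, rel_logits = self.sg_head(feats)
            # Supervised scene-graph losses on object and relationship labels.
            sg_loss = (F.cross_entropy(obj_logits, sg_targets["objects"]) +
                       F.cross_entropy(rel_logits, sg_targets["relations"]))
            # Auxiliary path: reconstruct the input image from the shared features.
            recon = self.decoder(feats)
            rec_loss = F.l1_loss(recon, images)
            return sg_loss + self.lambda_rec * rec_loss
    ```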

    Unpaired Image Captioning via Scene Graph Alignments

    Most current image captioning models rely heavily on paired image-caption datasets. However, collecting large-scale image-caption paired data is labor-intensive and time-consuming. In this paper, we present a scene graph-based approach for unpaired image captioning. Our framework comprises an image scene graph generator, a sentence scene graph generator, a scene graph encoder, and a sentence decoder. Specifically, we first train the scene graph encoder and the sentence decoder on the text modality. To align the scene graphs between images and sentences, we propose an unsupervised feature alignment method that maps the scene graph features from the image to the sentence modality. Experimental results show that our proposed model can generate quite promising results without using any image-caption training pairs, outperforming existing methods by a wide margin.
    Comment: Accepted in ICCV 201
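
    The unsupervised feature alignment step could, for example, be realized with an adversarial mapping from the image scene-graph feature space into the sentence scene-graph feature space. The sketch below uses that generic adversarial formulation as an assumption for illustration; the alignment objective actually used in the paper may differ, and the feature size is arbitrary.

    ```python
    # Hedged sketch of unpaired feature alignment via an adversarial mapping:
    # a mapper pushes image scene-graph features toward the sentence feature
    # space, and a discriminator tries to tell the two domains apart.
    import torch
    import torch.nn as nn

    feat_dim = 512                                              # assumed feature size
    mapper = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                           nn.Linear(feat_dim, feat_dim))       # image -> sentence space
    disc = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                         nn.Linear(256, 1))                     # sentence feature vs. mapped feature
    bce = nn.BCEWithLogitsLoss()

    def alignment_losses(img_feats, sent_feats):
        mapped = mapper(img_feats)
        # Discriminator loss: real sentence features labeled 1, mapped image features labeled 0.
        d_loss = bce(disc(sent_feats), torch.ones(len(sent_feats), 1)) + \
                 bce(disc(mapped.detach()), torch.zeros(len(mapped), 1))
        # Mapper loss: fool the discriminator so mapped features look like sentence features.
        g_loss = bce(disc(mapped), torch.ones(len(mapped), 1))
        return d_loss, g_loss
    ```

    In a training loop, `d_loss` would update the discriminator and `g_loss` the mapper, alternating as in standard adversarial training.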

    Beneficial Effects of Ethyl Pyruvate through Inhibiting High-Mobility Group Box 1 Expression and TLR4/NF-κB Pathway after Traumatic Brain Injury in the Rat

    Ethyl pyruvate (EP) has demonstrated neuroprotective effects against acute brain injury through its anti-inflammatory action. The nuclear protein high-mobility group box 1 (HMGB1) can activate inflammatory pathways when released from dying cells. This study was designed to investigate the protective effects of EP against secondary brain injury in rats after traumatic brain injury (TBI). Adult male rats were randomly divided into three groups: (1) sham + vehicle, (2) TBI + vehicle, and (3) TBI + EP (n = 30 per group). A right parietal cortical contusion was made using a weight-drop TBI method. In the TBI + EP group, EP was administered intraperitoneally at a dose of 75 mg/kg at 5 min, 1 h, and 6 h after TBI. Brain samples were harvested 24 h after TBI. We found that EP treatment markedly inhibited the expression of HMGB1 and TLR4, NF-κB DNA-binding activity, and inflammatory mediators such as IL-1β, TNF-α, and IL-6. EP treatment also significantly ameliorated beam-walking performance, brain edema, and cortical apoptotic cell death. These results suggest that the protective effects of EP may be mediated by a reduction of the HMGB1/TLR4/NF-κB-mediated inflammatory response in the injured rat brain.

    Neural Point Process for Learning Spatiotemporal Event Dynamics

    Learning the dynamics of spatiotemporal events is a fundamental problem. Neural point processes enhance the expressivity of point process models with deep neural networks. However, most existing methods only consider temporal dynamics without spatial modeling. We propose Deep Spatiotemporal Point Process (DeepSTPP), a deep dynamics model that integrates spatiotemporal point processes. Our method is flexible, efficient, and can accurately forecast irregularly sampled events over space and time. The key construction of our approach is the nonparametric space-time intensity function, governed by a latent process. The intensity function enjoys closed-form integration for the density. The latent process captures the uncertainty of the event sequence. We use amortized variational inference to infer the latent process with deep networks. Using synthetic datasets, we validate that our model can accurately learn the true intensity function. On real-world benchmark datasets, our model demonstrates superior performance over state-of-the-art baselines. Our code and data can be found at https://github.com/Rose-STL-Lab/DeepSTPP
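
    One plausible instantiation of a space-time intensity with a closed-form integral is a sum of event-centered kernels whose spatial factor integrates to one over the plane, so the compensator reduces to a one-dimensional temporal integral. The sketch below illustrates that idea only; the weights `w`, decay `beta`, and bandwidth `sigma` stand in for quantities the paper derives from the latent process, and this is not the authors' exact parameterization.

    ```python
    # Illustrative space-time intensity built from kernels around past events,
    # with a closed-form compensator (integral of the intensity over space-time).
    import numpy as np
    from scipy.stats import multivariate_normal

    def intensity(s, t, events, w, beta, sigma):
        """lambda(s, t) = sum_i w_i * exp(-beta (t - t_i)) * N(s; s_i, sigma^2 I),
        summing over past events (s_i, t_i) with t_i < t."""
        lam = 0.0
        for (s_i, t_i), w_i in zip(events, w):
            if t_i < t:
                spatial = multivariate_normal.pdf(s, mean=s_i, cov=sigma**2 * np.eye(2))
                lam += w_i * np.exp(-beta * (t - t_i)) * spatial
        return lam

    def compensator(T, events, w, beta):
        """Integral of lambda over all of space (the Gaussian integrates to 1)
        and over time [0, T], available in closed form."""
        total = 0.0
        for (s_i, t_i), w_i in zip(events, w):
            if t_i < T:
                total += w_i * (1.0 - np.exp(-beta * (T - t_i))) / beta
        return total
    ```

    The point-process log-likelihood of a sequence {(s_j, t_j)} on [0, T] is then the sum of log intensity values at the observed events minus the compensator.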

    Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis

    Diffusion-based models have achieved state-of-the-art performance on text-to-image synthesis tasks. However, one critical limitation of these models is the low fidelity of generated images with respect to the text description, such as missing objects, mismatched attributes, and mislocated objects. One key reason for such inconsistencies is the inaccurate cross-attention to text in both the spatial dimension, which controls at what pixel region an object should appear, and the temporal dimension, which controls how different levels of details are added through the denoising steps. In this paper, we propose a new text-to-image algorithm that adds explicit control over spatial-temporal cross-attention in diffusion models. We first utilize a layout predictor to predict the pixel regions for objects mentioned in the text. We then impose spatial attention control by combining the attention over the entire text description and that over the local description of the particular object in the corresponding pixel region of that object. The temporal attention control is further added by allowing the combination weights to change at each denoising step, and the combination weights are optimized to ensure high fidelity between the image and the text. Experiments show that our method generates images with higher fidelity compared to diffusion-model-based baselines without fine-tuning the diffusion model. Our code is publicly available at https://github.com/UCSB-NLP-Chang/Diffusion-SpaceTime-Attn.
    Comment: 20 pages, 16 figures
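
    A hedged sketch of the spatial-temporal attention control described above: inside an object's predicted layout region, the cross-attention is a convex combination of the attention computed with the full prompt and the attention computed with the object's local description, with a weight that may change at every denoising step. The tensor shapes and names (`alpha_t`, `layout_mask`) are illustrative assumptions, not the released implementation, and both attention maps are assumed aligned to the same token axis for simplicity.

    ```python
    # Illustrative combination of global and local cross-attention maps under a
    # per-denoising-step weight, restricted to the object's predicted region.
    import torch

    def controlled_attention(attn_global, attn_local, layout_mask, alpha_t):
        """attn_*: (heads, pixels, tokens) cross-attention maps from the full prompt
        and from the object's local description; layout_mask: (pixels,) 0/1 mask of
        the object's predicted region; alpha_t in [0, 1] is the per-step weight."""
        mask = layout_mask.float().view(1, -1, 1)          # broadcast over heads and tokens
        inside = alpha_t * attn_local + (1.0 - alpha_t) * attn_global
        mixed = mask * inside + (1.0 - mask) * attn_global  # only blend inside the region
        return mixed / (mixed.sum(dim=-1, keepdim=True) + 1e-8)  # renormalize over tokens
    ```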

    VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

    Vision and text have been fully explored in contemporary video-text foundation models, while other modalities such as audio and subtitles in videos have not received sufficient attention. In this paper, we establish connections between multi-modality video tracks, including Vision, Audio, and Subtitle, and Text by exploring an automatically generated large-scale omni-modality video caption dataset called VAST-27M. Specifically, we first collect 27 million open-domain video clips and separately train a vision captioner and an audio captioner to generate vision and audio captions. Then, we employ an off-the-shelf Large Language Model (LLM) to integrate the generated captions, together with subtitles and instructional prompts, into omni-modality captions. Based on the proposed VAST-27M dataset, we train an omni-modality video-text foundation model named VAST, which can perceive and process vision, audio, and subtitle modalities from video, and better supports various tasks including vision-text, audio-text, and multi-modal video-text tasks (retrieval, captioning, and QA). Extensive experiments demonstrate the effectiveness of the proposed VAST-27M corpus and VAST foundation model. VAST achieves 22 new state-of-the-art results on various cross-modality benchmarks. Code, model, and dataset will be released at https://github.com/TXH-mercury/VAST
    Comment: 23 pages, 5 figures
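
    As a concrete illustration of the caption-integration step, the sketch below fuses a clip's vision caption, audio caption, and subtitle into one omni-modality caption with an off-the-shelf LLM; the prompt wording and the `llm` callable are placeholders, not the released pipeline.

    ```python
    # Minimal sketch of the VAST-27M caption-integration step: an LLM merges the
    # per-modality descriptions of one clip into a single omni-modality caption.
    # `llm` is a placeholder for whatever text-generation call is actually used.
    def build_omni_caption(vision_caption: str, audio_caption: str, subtitle: str, llm) -> str:
        prompt = (
            "Combine the following descriptions of one video clip into a single "
            "natural caption that covers what is seen, heard, and said.\n"
            f"Visual description: {vision_caption}\n"
            f"Audio description: {audio_caption}\n"
            f"Subtitle: {subtitle}\n"
            "Omni-modality caption:"
        )
        return llm(prompt)
    ```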